It is far from clear how best to approximate one random vector (for
example, a stochastic process observed at n consecutive time-points) by
another random vector which may have for example a simpler stochastic
structure. This problem is illustrated by seeking constructions of points P and
Q uniformly distributed over concentric square and circle respectively and
of unit area, so as to minimize $D_1 = E|P-Q|$ and $D_2 = (E|P-Q|^2)^{1/2}$.
THE CLOSENESS OF UNIFORM RANDOM POINTS
IN A SQUARE AND A CIRCLE


D. J. Daley, Australian National University


1. STATEMENT OF PROBLEM


Let the points P and Q be uniformly distributed over a unit square S
and a circle C, concentric with S and having the same unit area. What
joint distribution(s) for P and Q minimize(s) $D_1 \equiv E|P-Q|$ and
$D_2 \equiv (E|P-Q|^2)^{1/2}$? ($|\cdot|$ denotes Euclidean distance.)
2. ORIGIN OF PROBLEM

Given distribution functions (d.f.s) F and G of $R^1$-valued random
variables (r.v.s), there exists a mapping in terms of a r.v. U uniformly
distributed on (0,1) such that r.v.s X* and Y* have F and G as their d.f.s
and $E|X^*-Y^*|$ and $E|X^*-Y^*|^2$ are least: the mapping
$(X^*,Y^*) = (F^{-1}(U), G^{-1}(U))$ suffices for both minimization problems.
It is essentially unique for minimizing $E|X^*-Y^*|^2$, but need not be unique
for minimizing $E|X^*-Y^*|$.
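The one-dimensional construction can be illustrated numerically. The following is only a sketch: the exponential and uniform marginals, the sample size, and the helper names are assumptions chosen for illustration and do not come from the text. It compares the common-U coupling with an independent coupling of the same two marginals.

```python
import math
import random


def exp_quantile(u, rate=1.0):
    """Inverse d.f. F^{-1}(u) of an exponential law (illustrative choice of F)."""
    return -math.log(1.0 - u) / rate


def unif_quantile(u, lo=0.0, hi=2.0):
    """Inverse d.f. G^{-1}(u) of a uniform law on (lo, hi) (illustrative G)."""
    return lo + (hi - lo) * u


def compare_couplings(n=20000, seed=0):
    """Monte Carlo estimates of E|X-Y| and E|X-Y|^2 under two couplings.

    Returns ((abs, sq) for the common-U coupling, (abs, sq) for independence).
    """
    rng = random.Random(seed)
    c_abs = c_sq = i_abs = i_sq = 0.0
    for _ in range(n):
        u, v = rng.random(), rng.random()
        x = exp_quantile(u)
        y_common = unif_quantile(u)   # the same U drives both marginals
        y_indep = unif_quantile(v)    # an independent U, for comparison
        c_abs += abs(x - y_common)
        c_sq += (x - y_common) ** 2
        i_abs += abs(x - y_indep)
        i_sq += (x - y_indep) ** 2
    return (c_abs / n, c_sq / n), (i_abs / n, i_sq / n)
```

Both estimated distances should come out smaller for the common-U coupling than for the independent one, in line with the optimality stated above.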

For the analogous problem for $R^2$-valued r.v.s, the optimal strategy
is no longer clear. The strategy is needed in approximating one sequence
of r.v.s by another, in that the approximating sequence should be close to
the original sequence, especially when the r.v.s are structural elements in
a stochastic process.

3. PARTIAL SOLUTION OF PROBLEM, I: $D_1$

For definiteness, let the square S have vertices at $(\pm 2^{-1/2}, 0)$,
$(0, \pm 2^{-1/2})$, so that the circle has centre at the origin and radius
$\pi^{-1/2}$. Symmetry considerations show that it is enough to consider 1-1
maps of P and Q which lie in the 45° wedge $0 \le y \le x$ intersecting S and
C respectively.

In minimizing $D_1$, it is asserted first that it is enough to consider
mappings $T: S \to C$ for which Q = TP has P = Q if P and Q are in the common
part of S and C, and hence region A (the part of S outside C) is then mapped
1-1 into region B (the part of C outside S; see Figure 1). For suppose not;
let $S_0$ be the measurable subset of A that is not mapped into B ($S_0$ is
measurable because the mapping T, being defined via r.v.s, is measurable).
Using $\lambda(\cdot)$ to denote Lebesgue measure, if $\lambda(S_0) = 0$, then
$\Pr\{P: TP \in B\} = \Pr\{P \in A\}$ and there is nothing more to prove.
Otherwise, set $C_0 = TS_0$, and let $S_1$ be the part of $C_0$ (if any) that
is not mapped into B.
If $\lambda(S_1) = 0$, then $\Pr\{TP \text{ or } T^2P \in B \mid P \in A\} = 1$,
and we can consider the mapping T* defined by

$$T^*P = \begin{cases} P & (P \notin A) \\ T^{n(P)}P & (P \in A) \end{cases}$$

where $n(P) = \inf\{n: T^nP \in B\}$ $(P \in A)$; when $\lambda(S_1) = 0$,
n(P) = 1 or 2. If $\lambda(S_1) > 0$, continue sequentially defining $S_{n+1}$
equal to the subset of $C_n = TS_n$ which is not in B. Since all
$\{S_j: j = 0,1,\ldots\}$ are disjoint (because the mapping is 1-1), and
$1 = \lambda(S) \ge \sum_{j=0}^{\infty} \lambda(S_j)$, we conclude that

$$\Pr\{P \in A: n(P) < \infty\} = \lambda(A).$$

Consequently, given any $P \in A$, we can a.s. find points $P_j = T^jP$
$(j < n(P))$, $Q = T^{n(P)}P$, such that

$$|P - T^*P| \le |P - P_1| + |P_1 - P_2| + \cdots + |P_{n(P)-1} - Q|,$$

and therefore, for any mapping T that does not map A into B, there is a mapping
T* taking A into B and setting T*P = P for $P \notin A$ for which
$E|P - T^*P| \le E|P - TP|$.

We assert next that for any $P_1, P_2$ in A, the optimal mapping T cannot have
the two straight line segments $[P_1, TP_1]$, $[P_2, TP_2]$ intersecting, for
if they did, the triangle inequality again shows that

$$|P_1 - TP_2| + |P_2 - TP_1| \le |P_1 - TP_1| + |P_2 - TP_2|.$$
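The uncrossing step can be checked on a small example. The sketch below is illustrative only: the function names and the orientation test used to detect a proper crossing are assumptions, not part of the original argument. When two segments cross, exchanging their end-points does not increase the total length.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def segments_cross(p1, q1, p2, q2):
    """True if the segments [p1, q1] and [p2, q2] cross at an interior point."""
    def orient(a, b, c):
        # Sign of the cross product (b - a) x (c - a).
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    d1 = orient(p1, q1, p2)
    d2 = orient(p1, q1, q2)
    d3 = orient(p2, q2, p1)
    d4 = orient(p2, q2, q1)
    return d1 * d2 < 0 and d3 * d4 < 0


def uncross(p1, q1, p2, q2):
    """Return the images of p1 and p2 after swapping targets if needed."""
    if segments_cross(p1, q1, p2, q2):
        return q2, q1
    return q1, q2
```

For instance, with p1 = (0, 0), q1 = (1, 1), p2 = (0, 1), q2 = (1, 0) the segments are the two diagonals of the unit square, and swapping the targets replaces them by two sides.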

To describe a mapping having such a property, construct tangents to the
regions A and B as indicated by the dotted lines in Figure 1, intersecting
in M say. With rays centered on M, sweep out segments of the rays through
B and A so that equal areas of B and A are cut off by corresponding segments,
and map points of one segment onto the other in proportion to the areas of
the two truncated cones formed by the ray segments and differentially
perturbed ray segments.
I believe this mapping as just outlined is optimal, though a complete
proof is lacking. Whether there exists a unique optimal mapping (strictly,
an equivalence class of such mappings, since any mapping may be altered on a
set of zero Lebesgue measure), I do not know either. What is easy to show
(and tractable to compute), by rotation of axes so that one is then parallel
to the line through the centroids $\mu^A$ and $\mu^B$ of A and B respectively,
is that

$$D_1 \ge |\mu^A - \mu^B| \Pr\{P \ne Q\}.$$

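Under the mapping as described,

$$\Pr\{P \ne Q\} = 8 \cdot \tfrac12 \left( \pi^{-1}\arccos(\pi^{1/2}/2) - \tfrac12 (\pi^{-1} - 4^{-1})^{1/2} \right)$$

$$= 4\pi^{-1}\arccos(\pi^{1/2}/2) - (4\pi^{-1} - 1)^{1/2} \approx .090546.$$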
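The value of Pr{P ≠ Q}, which is the area of C lying outside S, can be checked numerically. The sketch below is illustrative only (the function names and the midpoint rule are assumptions): it evaluates the closed form and compares it with a direct polar-coordinate integration over the part of the circle beyond one half-side of the square, multiplied by 8.

```python
import math


def prob_p_ne_q_closed_form():
    """Area of C outside S from 4 arccos(pi^(1/2)/2)/pi - (4/pi - 1)^(1/2)."""
    alpha = math.acos(math.sqrt(math.pi) / 2.0)
    return 4.0 * alpha / math.pi - math.sqrt(4.0 / math.pi - 1.0)


def prob_p_ne_q_quadrature(n=20000):
    """Midpoint-rule check of the same area, summed over the 8 wedges.

    theta is measured from the normal to a side of the square, so the side
    lies at polar distance 1 / (2 cos(theta)) and the circle at radius r.
    """
    r2 = 1.0 / math.pi                        # squared radius of C
    alpha = math.acos(math.sqrt(math.pi) / 2.0)
    h = alpha / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        side2 = 0.25 / math.cos(theta) ** 2   # squared distance to the side
        total += 0.5 * (r2 - side2) * h       # polar area element
    return 8.0 * total
```

The two functions should agree closely with each other and with the quoted value .090546.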

By further calculation, $\mu^A = (.606448, .032890)$,
$\mu^B = (.441662, .301939)$, so

$$D_1 \ge .028567.$$

It is also worth remarking that any mapping that identifies P and Q
inside the common part of S and C yields

$$D_0 \equiv \inf_{\text{maps}} \lim_{a \downarrow 0} E|P - Q|^a = \Pr\{P \ne Q\}.$$

4. PARTIAL SOLUTION OF PROBLEM, II: $D_2$

It will be convenient to retain the same axes as in Section 3, but to
write r for the radius of C: later we shall consider circles of different
radii. Let W, Z be independent r.v.s uniformly distributed over (0,1), and
let

$$P = (Z(W/2)^{1/2},\ (1-Z)(W/2)^{1/2}),$$

$$Q = (rW^{1/2}\sin\tfrac12\pi Z,\ rW^{1/2}\cos\tfrac12\pi Z).$$

It can be verified that P and Q are then uniformly distributed over the
first quadrant in the square S and a concentric circle of radius r. Further,
the mapping corresponds to letting a radial arm sweep out equal areas from
the y-axis in a clockwise direction, being a proportion Z of the total area
of each region of S and C in the quadrant, and then mapping the points on
each arm into one another in proportion $W^{1/2}$ of their respective total
lengths.
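A partial numerical check of this construction is sketched below. The function names, sample size and seed are assumptions made for illustration. The sketch builds P and Q from a common pair (W, Z) and checks only the radial marginals: uniformity on the quadrant of S predicts Pr{x + y ≤ t} = 2t², and uniformity on the quarter disc predicts Pr{|Q| ≤ ρ} = (ρ/r)².

```python
import math
import random


def coupled_points(w, z, r):
    """Return (P, Q) built from the same (W, Z) as in the construction above."""
    s = math.sqrt(w / 2.0)                    # x + y for the point P
    p = (z * s, (1.0 - z) * s)
    rho = r * math.sqrt(w)                    # distance of Q from the centre
    q = (rho * math.sin(0.5 * math.pi * z), rho * math.cos(0.5 * math.pi * z))
    return p, q


def radial_check(t, rho, r, n=50000, seed=1):
    """Empirical Pr{x + y <= t} for P and Pr{|Q| <= rho} for Q."""
    rng = random.Random(seed)
    hits_p = hits_q = 0
    for _ in range(n):
        p, q = coupled_points(rng.random(), rng.random(), r)
        hits_p += (p[0] + p[1] <= t)
        hits_q += (math.hypot(q[0], q[1]) <= rho)
    return hits_p / n, hits_q / n
```

A fuller check would also examine the distribution along each arm; this sketch does not attempt that.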

It is conjectured that this mapping minimizes $D_2$; if so,

$$D_2^2 = E[W(Z/2^{1/2} - r\sin\tfrac12\pi Z)^2 + W((1-Z)/2^{1/2} - r\cos\tfrac12\pi Z)^2]$$

$$= \tfrac12 E[\tfrac12(Z^2 + (1-Z)^2) + r^2 - 2^{1/2} r (Z\sin\tfrac12\pi Z + (1-Z)\cos\tfrac12\pi Z)].$$


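With $r = \pi^{-1/2}$, this expression equals

$$\frac16 + \frac{1}{2\pi} - \left(\frac{2}{\pi}\right)^{5/2} \approx .002451 \approx (.04951)^2,$$

while choosing r so as to minimize the mean square distance, i.e., putting
$r = (4\sqrt2)\pi^{-2}$, the mean square distance equals

$$\frac16 - \frac{16}{\pi^4} \approx .002411.$$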
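These two values can be reproduced as follows. The sketch assumes the elementary integrals E[Z² + (1-Z)²] = 2/3 and E[Z sin ½πZ] = E[(1-Z) cos ½πZ] = 4/π², which reduce the mean square distance to 1/6 + r²/2 - 4·2^{1/2} r/π²; the function names and the midpoint-rule comparison are illustrative assumptions.

```python
import math


def mean_square_distance(r):
    """Closed form 1/6 + r^2/2 - 4 * 2^(1/2) * r / pi^2 of E|P - Q|^2."""
    return 1.0 / 6.0 + r * r / 2.0 - 4.0 * math.sqrt(2.0) * r / math.pi ** 2


def mean_square_distance_quadrature(r, n=20000):
    """Midpoint-rule evaluation of the expectation over Z, for comparison."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * h
        a = 0.5 * math.pi * z
        cross = z * math.sin(a) + (1.0 - z) * math.cos(a)
        total += (0.5 * (z * z + (1.0 - z) ** 2) + r * r
                  - math.sqrt(2.0) * r * cross) * h
    return 0.5 * total                         # the factor 1/2 is E[W]


r_unit_area = math.pi ** -0.5                  # radius giving C unit area
r_best = 4.0 * math.sqrt(2.0) / math.pi ** 2   # minimizer of the quadratic in r
```

Evaluating `mean_square_distance` at `r_unit_area` and at `r_best` should give values close to .002451 and .002411 respectively.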

Certainly we must have

$$D_2 \le .04951,$$

while by writing $P - Q = P - \mu^S + \mu^S - \mu^C + \mu^C - Q$, where $\mu^S$
and $\mu^C$ are the centroids of S and C in the wedge $0 \le y \le x$, we must
have

$$D_2 \ge |\mu^S - \mu^C| = .02693.$$
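The centroid bound rests on the inequality (E|P-Q|²)^{1/2} ≥ |E(P-Q)|. It can be evaluated from standard centroid formulas, as in the sketch below. The function names are assumptions; the part of S in the wedge is taken to be the triangle with vertices (0,0), (2^{-1/2},0), (2^{-3/2},2^{-3/2}), and the part of C in the wedge is taken to be the circular sector of angle π/4 and radius π^{-1/2}.

```python
import math


def wedge_centroids():
    """Centroids of the parts of S and C lying in the wedge 0 <= y <= x."""
    c = 2.0 ** -0.5                    # distance from the centre to a vertex of S
    r = math.pi ** -0.5                # radius of C
    # Triangle with vertices (0, 0), (c, 0), (c/2, c/2): mean of the vertices.
    mu_s = ((0.0 + c + c / 2.0) / 3.0, (0.0 + 0.0 + c / 2.0) / 3.0)
    # Sector of half-angle pi/8 about the direction pi/8: its centroid lies
    # at distance (2/3) r sin(half)/half from the centre.
    half = math.pi / 8.0
    d = (2.0 / 3.0) * r * math.sin(half) / half
    mu_c = (d * math.cos(half), d * math.sin(half))
    return mu_s, mu_c


def centroid_gap():
    """Euclidean distance between the two wedge centroids."""
    mu_s, mu_c = wedge_centroids()
    return math.hypot(mu_s[0] - mu_c[0], mu_s[1] - mu_c[1])
```

Under these assumptions `centroid_gap()` should return a value close to the quoted .02693.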


By considering the motion of points on the perimeter of the circle under
the mapping described in this section, it can be deduced that the mean distance
moved is at least as large as

$$\frac23 \cdot \frac{4}{\pi} \left[ \int_0^{\alpha} (\pi^{-1/2}\cos\theta - \tfrac12)\,d\theta + 2^{1/2} \int_{\alpha}^{\pi/4} (\tfrac12 - \pi^{-1/2}\cos\theta)\,d\theta \right] \approx .03458,$$

where $\alpha = \arccos(\pi^{1/2}/2)$.
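The perimeter bound can be evaluated numerically. The sketch below assumes that the bound reads (2/3)(4/π)[∫₀^α (π^{-1/2} cos θ - ½) dθ + 2^{1/2} ∫_α^{π/4} (½ - π^{-1/2} cos θ) dθ] with α = arccos(π^{1/2}/2), a reading that reproduces the quoted value. The function names and the midpoint rule are illustrative assumptions.

```python
import math


def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def perimeter_bound():
    """Evaluate the assumed form of the lower bound on the mean distance moved."""
    r = math.pi ** -0.5
    alpha = math.acos(math.sqrt(math.pi) / 2.0)
    # Circle outside the square for 0 <= theta <= alpha ...
    i1 = midpoint_integral(lambda t: r * math.cos(t) - 0.5, 0.0, alpha)
    # ... and inside it for alpha <= theta <= pi/4.
    i2 = midpoint_integral(lambda t: 0.5 - r * math.cos(t), alpha, math.pi / 4.0)
    return (2.0 / 3.0) * (4.0 / math.pi) * (i1 + math.sqrt(2.0) * i2)
```

Here the factor 2/3 is E[W^{1/2}] and 4/π averages over the angle in the wedge; `perimeter_bound()` should return a value close to .03458.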


On the other hand, any mapping that leaves P inside $S \cap C$ invariant has
$(E|P-Q|^2)^{1/2} \ge |\mu^A - \mu^B| (\Pr\{P \ne Q\})^{1/2} \approx .094936$.

5. CONCLUDING REMARKS

The argument showing that an optimal mapping as measured by $D_1$ (or $D_0$)
leaves invariant points in the common part of S and C, extends to any two
bounded sets in $R^d$ of the same d-dimensional volume. The other property of
an optimal map, the non-intersection of line-segments $[P_1, TP_1]$ and
$[P_2, TP_2]$, extends to other figures in $R^2$ of the same area: the
higher-dimensional analogue is harder to visualize.

Note that the suggested (class of) optimal mappings, leaving P invariant
inside $S \cap C$, has some suggestions of the probabilistic notion of coupling
of stochastic processes on discrete state space, or of yielding the variation
metric of two probability distributions.

There may well be a physical principle underlying the mapping which is
suggested as minimizing $D_2$: connected regions in S remain connected when
mapped into C. (This property is not held by neighbourhoods on the boundary
of A interior to S under the mapping of Section 3 for $D_1$.) Observe that if
P = (x,y), then
$Q = (r 2^{1/2}(x+y) \sin(\tfrac12\pi x/(x+y)),\ r 2^{1/2}(x+y) \sin(\tfrac12\pi y/(x+y)))$,
which is not an analytic function of x + iy. The mapping has the obviously
appealing property of being defined for the class of similar circles C. And its
higher-dimensional analogue can also be visualized: map the surface of an
orthant onto the surface of the hyper-sphere optimally, and the rest is scaled
by the d-th root of the radial distance; the harder part is to determine the
mapping of the surfaces.
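The explicit form of the mapping in the first quadrant can be written as a short function. This is a sketch only: the function name and the treatment of the origin are assumptions. It sends P = (x, y) to the point Q at distance r 2^{1/2}(x+y) from the centre.

```python
import math


def square_to_circle(x, y, r):
    """Map P = (x, y) in the first-quadrant part of S to Q in the quarter disc."""
    s = x + y
    if s == 0.0:
        return (0.0, 0.0)               # the common centre is left fixed
    k = r * math.sqrt(2.0) * s          # |Q| = r * 2^(1/2) * (x + y)
    return (k * math.sin(0.5 * math.pi * x / s),
            k * math.sin(0.5 * math.pi * y / s))
```

For example, the vertex (2^{-1/2}, 0) of the square is sent to the point (r, 0) of the circle, and the mid-point of a side is sent to the point of the circle on the line y = x.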

I thank several colleagues at UNC for discussion, particularly Gordon
Simons, Ross Leadbetter, and Clayton Bromley.